AITopics | time sery dataset

Collaborating Authors

time sery dataset

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification

Chen, Xinyu, Cai, HanQin, Ding, Lijun, Zhao, Jinhua

arXiv.org Machine LearningMay-19-2026

We present TailedTS, a large-scale benchmark dataset derived from Wikipedia hourly page view observations throughout 2024, specifically designed to test time series forecasting models under heavy-tailed, zero-inflated, and non-Gaussian conditions. The dataset comprises approximately 24.69 billion data points spanning roughly 3 million unique Wikipedia pages per month, stored in high-efficiency Apache Parquet format. Wikipedia traffic follows a pronounced power-law distribution where roughly 5% of pages account for over 70% of total page views, creating a natural and rigorous testbed for model robustness against extreme volatility that are absent from or underrepresented in existing benchmarks such as M4, M5, and UCI electricity datasets. TailedTS enables several research tasks. First, we introduce a periodicity quantification framework based on sparse autoregression with sparsity and non-negativity constraints, revealing that frequently-viewed pages exhibit significantly weaker periodic structure than their less-viewed counterparts, showing direct implications for server allocation and traffic forecasting on large digital platforms. Second, we provide standardized prediction benchmarks evaluated under a suite of non-Gaussian loss functions, including $\ell_1$-norm, Huber, quantile, and $\ell_p$-norm losses, demonstrating that standard Gaussian-based estimators degrade substantially on high-volume page categories, while robust alternatives provide consistent gains across all traffic scales. TailedTS is publicly available at https://doi.org/10.5281/zenodo.17070469.

data mining, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2605.16361

Country:

North America > United States > California (0.46)
North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Power Industry (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

fc20ea8d104cab737a5561096f9bde9b-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsMay-1-2026, 05:15:41 GMT

AQuAcurrently includes a variety of datasets for different classification problems, varying in the number of classes, sources of annotations, and data modalities. All datasets except those marked with are multi-class.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

A Appendix

Neural Information Processing SystemsFeb-18-2026, 03:00:34 GMT

The classes are airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks, and they are all mutually exclusive.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Florida > Broward County (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.46)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

DenoGrad: Deep Gradient Denoising Framework for Enhancing the Performance of Interpretable AI Models

Alonso-Ramos, J. Javier, Aguilera-Martos, Ignacio, Herrera-Poyatos, Andrés, Herrera, Francisco

arXiv.org Artificial IntelligenceNov-14-2025

The performance of Machine Learning (ML) models, particularly those operating within the Interpretable Artificial Intelligence (Interpretable AI) framework, is significantly affected by the presence of noise in both training and production data. Denoising has therefore become a critical preprocessing step, typically categorized into instance removal and instance correction techniques. However, existing correction approaches often degrade performance or oversimplify the problem by altering the original data distribution. This leads to unrealistic scenarios and biased models, which is particularly problematic in contexts where interpretable AI models are employed, as their interpretability depends on the fidelity of the underlying data patterns. In this paper, we argue that defining noise independently of the solution may be ineffective, as its nature can vary significantly across tasks and datasets. Using a task-specific high quality solution as a reference can provide a more precise and adaptable noise definition. To this end, we propose DenoGrad, a novel Gradient-based instance Denoiser framework that leverages gradients from an accurate Deep Learning (DL) model trained on the target data -- regardless of the specific task -- to detect and adjust noisy samples. Unlike conventional approaches, DenoGrad dynamically corrects noisy instances, preserving problem's data distribution, and improving AI models robustness. DenoGrad is validated on both tabular and time series datasets under various noise settings against the state-of-the-art. DenoGrad outperforms existing denoising strategies, enhancing the performance of interpretable IA models while standing out as the only high quality approach that preserves the original data distribution.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.10161

Country: North America > United States (0.28)

Genre:

Workflow (1.00)
Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.92)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TelecomTS: A Multi-Modal Observability Dataset for Time Series and Language Analysis

Feng, Austin, Varvarigos, Andreas, Panitsas, Ioannis, Fernandez, Daniela, Wei, Jinbiao, Guo, Yuwei, Chen, Jialin, Maatouk, Ali, Tassiulas, Leandros, Ying, Rex

arXiv.org Artificial IntelligenceOct-8-2025

Modern enterprises generate vast streams of time series metrics when monitoring complex systems, known as observability data. Unlike conventional time series from domains such as weather, observability data are zero-inflated, highly stochastic, and exhibit minimal temporal structure. Despite their importance, observability datasets are underrepresented in public benchmarks due to proprietary restrictions. Existing datasets are often anonymized and normalized, removing scale information and limiting their use for tasks beyond forecasting, such as anomaly detection, root-cause analysis, and multi-modal reasoning. To address this gap, we introduce TelecomTS, a large-scale observability dataset derived from a 5G telecommunications network. TelecomTS features heterogeneous, de-anonymized covariates with explicit scale information and supports a suite of downstream tasks, including anomaly detection, root-cause analysis, and a question-answering benchmark requiring multi-modal reasoning. Benchmarking state-of-the-art time series, language, and reasoning models reveals that existing approaches struggle with the abrupt, noisy, and high-variance dynamics of observability data. Our experiments also underscore the importance of preserving covariates' absolute scale, emphasizing the need for foundation time series models that natively leverage scale information for practical observability applications.

data mining, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.06063

Genre: Research Report > New Finding (0.46)

Industry:

Telecommunications > Networks (1.00)
Information Technology > Networks (1.00)
Health & Medicine (1.00)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Learning Soft Sparse Shapes for Efficient Time-Series Classification

Liu, Zhen, Luo, Yicheng, Li, Boyuan, Eldele, Emadeldeen, Wu, Min, Ma, Qianli

arXiv.org Artificial IntelligenceJun-4-2025

Shapelets are discriminative subsequences (or shapes) with high interpretability in time series classification. Due to the time-intensive nature of shapelet discovery, existing shapelet-based methods mainly focus on selecting discriminative shapes while discarding others to achieve candidate subsequence sparsification. However, this approach may exclude beneficial shapes and overlook the varying contributions of shapelets to classification performance. To this end, we propose a Soft sparse Shapes (SoftShape) model for efficient time series classification. Our approach mainly introduces soft shape sparsification and soft shape learning blocks. The former transforms shapes into soft representations based on classification contribution scores, merging lower-scored ones into a single shape to retain and differentiate all subsequence information. The latter facilitates intra- and inter-shape temporal pattern learning, improving model efficiency by using sparsified soft shapes as inputs. Specifically, we employ a learnable router to activate a subset of class-specific expert networks for intra-shape pattern learning. Meanwhile, a shared expert network learns inter-shape patterns by converting sparsified shapes into sequences. Extensive experiments show that SoftShape outperforms state-of-the-art methods and produces interpretable results.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2505.06892

Country:

Asia > China (0.28)
North America (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An Efficient Model Maintenance Approach for MLOps

Majidi, Forough, Khomh, Foutse, Li, Heng, Nikanjam, Amin

arXiv.org Artificial IntelligenceDec-5-2024

In recent years, many industries have utilized machine learning models (ML) in their systems. Ideally, machine learning models should be trained on and applied to data from the same distributions. However, the data evolves over time in many application areas, leading to data and concept drift, which in turn causes the performance of the ML models to degrade over time. Therefore, maintaining up to date ML models plays a critical role in the MLOps pipeline. Existing ML model maintenance approaches are often computationally resource intensive, costly, time consuming, and model dependent. Thus, we propose an improved MLOps pipeline, a new model maintenance approach and a Similarity Based Model Reuse (SimReuse) tool to address the challenges of ML model maintenance. We identify seasonal and recurrent distribution patterns in time series datasets throughout a preliminary study. Recurrent distribution patterns enable us to reuse previously trained models for similar distributions in the future, thus avoiding frequent retraining. Then, we integrated the model reuse approach into the MLOps pipeline and proposed our improved MLOps pipeline. Furthermore, we develop SimReuse, a tool to implement the new components of our MLOps pipeline to store models and reuse them for inference of data segments with similar data distributions in the future. Our evaluation results on four time series datasets demonstrate that our model reuse approach can maintain the performance of models while significantly reducing maintenance time and costs. Our model reuse approach achieves ML performance comparable to the best baseline, while being 15 times more efficient in terms of computation time and costs. Therefore, industries and practitioners can benefit from our approach and use our tool to maintain the performance of their ML models in the deployment phase to reduce their maintenance costs.

artificial intelligence, dataset, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2412.04657

Country:

Europe > Switzerland > Zürich > Zürich (0.07)
North America > Canada > Quebec (0.04)
Oceania > Australia > New South Wales (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Information Technology > Services (0.46)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis

Gkillas, Alexandros, Lalos, Aris

arXiv.org Artificial IntelligenceNov-6-2024

Anomaly and missing data constitute a thorny problem in industrial applications. In recent years, deep learning enabled anomaly detection has emerged as a critical direction, however the improved detection accuracy is achieved with the utilization of large neural networks, increasing their storage and computational cost. Moreover, the data collected in edge devices contain user privacy, introducing challenges that can be successfully addressed by the privacy-preserving distributed paradigm, known as federated learning (FL). This framework allows edge devices to train and exchange models increasing also the communication cost. Thus, to deal with the increased communication, processing and storage challenges of the FL based deep anomaly detection NN pruning is expected to have significant benefits towards reducing the processing, storage and communication complexity. With this focus, a novel compression-based optimization problem is proposed at the server-side of a FL paradigm that fusses the received local models broadcast and performs pruning generating a more compressed model. Experiments in the context of anomaly detection and missing value imputation demonstrate that the proposed FL scenario along with the proposed compressed-based method are able to achieve high compression rates (more than $99.7\%$) with negligible performance losses (less than $1.18\%$ ) as compared to the centralized solutions.

edge device, scenario, sensor, (14 more...)

arXiv.org Artificial Intelligence

2411.03996

Country: Europe > Greece (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Smart Houses & Appliances (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Hypergraph-based multi-scale spatio-temporal graph convolution network for Time-Series anomaly detection

Xu, Hongyi

arXiv.org Artificial IntelligenceOct-29-2024

Multivariate time series anomaly detection technology plays an important role in many fields including aerospace, water treatment, cloud service providers, etc. Excellent anomaly detection models can greatly improve work efficiency and avoid major economic losses. However, with the development of technology, the increasing size and complexity of data, and the lack of labels for relevant abnormal data, it is becoming increasingly challenging to perform effective and accurate anomaly detection in high-dimensional and complex data sets. In this paper, we propose a hypergraph based spatiotemporal graph convolutional neural network model STGCN_Hyper, which explicitly captures high-order, multi-hop correlations between multiple variables through a hypergraph based dynamic graph structure learning module. On this basis, we further use the hypergraph based spatiotemporal graph convolutional network to utilize the learned hypergraph structure to effectively propagate and aggregate one-hop and multi-hop related node information in the convolutional network, thereby obtaining rich spatial information. Furthermore, through the multi-scale TCN dilated convolution module, the STGCN_hyper model can also capture the dependencies of features at different scales in the temporal dimension. An unsupervised anomaly detector based on PCA and GMM is also integrated into the STGCN_hyper model. Through the anomaly score of the detector, the model can detect the anomalies in an unsupervised way. Experimental results on multiple time series datasets show that our model can flexibly learn the multi-scale time series features in the data and the dependencies between features, and outperforms most existing baseline models in terms of precision, recall, F1-score on anomaly detection tasks. Our code is available on: https://git.ecdf.ed.ac.uk/msc-23-24/s2044819

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.22256

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology (1.00)
Water & Waste Management > Water Management > Lifecycle > Treatment (0.66)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Language Model-Guided Framework for Mining Time Series with Distributional Shifts

Zhu, Haibei, El-Laham, Yousef, Fons, Elizabeth, Vyetrenko, Svitlana

arXiv.org Artificial IntelligenceJun-7-2024

Effective utilization of time series data is often constrained by the scarcity of data quantity that reflects complex dynamics, especially under the condition of distributional shifts. Existing datasets may not encompass the full range of statistical properties required for robust and comprehensive analysis. And privacy concerns can further limit their accessibility in domains such as finance and healthcare. This paper presents an approach that utilizes large language models and data source interfaces to explore and collect time series datasets. While obtained from external sources, the collected data share critical statistical properties with primary time series datasets, making it possible to model and adapt to various scenarios. This method enlarges the data quantity when the original data is limited or lacks essential properties. It suggests that collected datasets can effectively supplement existing datasets, especially involving changes in data distribution. We demonstrate the effectiveness of the collected datasets through practical examples and show how time series forecasting foundation models fine-tuned on these datasets achieve comparable performance to those models without fine-tuning.

dataset, distributional shift, time sery, (13 more...)

arXiv.org Artificial Intelligence

2406.05249

Country:

North America > United States (0.15)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)

Genre: Research Report (0.82)

Industry:

Banking & Finance > Economy (0.93)
Health & Medicine (0.88)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback